Experimental evaluation of Italian language models for large-dictionary speech recognition
Abstract
This paper reports on experiments performed on the Italian language to assess the efficiency of probabilistic language models for a large-dictionary speech recognition task. Two different types of models, an M-gram and an Mg-gram one, were investigated for comparison. The quality of the models, trained on a corpus of 3.5 million words, was measured in terms of perplexity and of the improvement achieved by integrating the language model into real speech recognition systems. Judging from the latter, empirical measurement, the two language models exhibit equivalent performance for Italian, although the perplexity measurements would suggest otherwise.
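As a rough illustration of the perplexity measure mentioned in the abstract, the sketch below computes the perplexity of a toy bigram model on a short test sequence. It is not the paper's M-gram or Mg-gram model: the bigram order, the add-one smoothing, the function names, and the tiny Italian sentences are all assumptions chosen only to keep the example self-contained.

```python
import math
from collections import Counter


def train_bigram(tokens):
    """Count bigrams and their left contexts in a token list."""
    # Contexts exclude the final token, which is never followed by anything.
    contexts = Counter(tokens[:-1])
    bigrams = Counter(zip(tokens, tokens[1:]))
    return contexts, bigrams


def perplexity(tokens, contexts, bigrams, vocab_size):
    """Perplexity of `tokens` under an add-one-smoothed bigram model.

    Add-one smoothing is an assumption of this sketch, used so that
    unseen bigrams still receive a non-zero probability.
    """
    log_prob = 0.0
    n = 0
    for prev, word in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size)
        log_prob += math.log2(p)
        n += 1
    # Perplexity is 2 raised to the average negative log2 probability.
    return 2 ** (-log_prob / n)


# Hypothetical toy data, not taken from the paper's 3.5-million-word corpus.
train = "il gatto dorme . il cane dorme . il gatto mangia .".split()
test = "il cane mangia .".split()

contexts, bigrams = train_bigram(train)
vocab_size = len(set(train))
ppl = perplexity(test, contexts, bigrams, vocab_size)
print(f"perplexity = {ppl:.3f}")
```

A lower perplexity means the model assigns higher average probability to the test words. As the abstract notes, a difference in perplexity between two models does not necessarily translate into a difference in recognition performance.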
Similar resources
A New Method for Speech Enhancement Based on Incoherent Model Learning in Wavelet Transform Domain
Quality of speech signal significantly reduces in the presence of environmental noise signals and leads to the imperfect performance of hearing aid devices, automatic speech recognition systems, and mobile phones. In this paper, the single channel speech enhancement of the corrupted signals by the additive noise signals is considered. A dictionary-based algorithm is proposed to train the speech...
Adaptation of Pronunciation Dictionaries for Recognition of Unseen Languages
This paper studies the relative effectiveness of different methods for multilingual model combination and dictionary mapping for recognizing a new unseen target language if training data are limited. We examine the cross-language transfer from monolingual and multilingual models to the German and Russian languages for large vocabulary speech recognition using a dictation database which has been colle...
A Large Vocabulary Continuous Speech Recognition System for Indonesian Language
This paper presents our work to build a pioneering Indonesian Large Vocabulary Continuous Speech Recognition (LVCSR) System. In order to build an LVCSR system, highly accurate acoustic models and large-scale language models are essential. Since an Indonesian speech corpus was not yet available, we tried to collect speech data from Indonesian native speakers to construct a speech corpus for training ...
Finite-state Transducer Base with Explicit Modeling of Ph
This article describes the design and the experimental evaluation of the first Hungarian large vocabulary continuous speech recognition (LVCSR) system. The architecture of the recognition system is based on the recently proposed weighted finite state transducer (WFST) paradigm. The task domain is the recognition of fluently read sentences selected from a major daily newspaper. Recognition perfo...
Automatic Clinical Speech Recognition for CLEF 2015 eHealth Challenge
In this working notes report/paper, we describe the details of two submissions for CLEF 2015 eHealth challenge for Task 1a, with details of methods and tools developed for automatic speech recognition of NICTA synthetic nursing handover dataset. The first method involves a novel zero-resource approach based on unsupervised acoustic only modeling of speech involving word discovery, and the secon...
Publication date: 1987